Hotspots and super-spreaders: Modelling fine-scale malaria parasite transmission using mosquito flight behaviour

Malaria hotspots have been the focus of public health managers for several years due to the potential elimination gains that can be obtained from targeting them. The identification of hotspots must be accompanied by the description of the overall network of stable and unstable hotspots of malaria, especially in medium and low transmission settings where malaria elimination is targeted. Targeting hotspots with malaria control interventions has, so far, not produced expected benefits. In this work we have employed a mechanistic-stochastic algorithm to identify clusters of super-spreader houses and their related stable hotspots by accounting for mosquito flight capabilities and the spatial configuration of malaria infections at the house level. Our results show that the number of super-spreading houses and hotspots is dependent on the spatial configuration of the villages. In addition, super-spreaders are also associated to house characteristics such as livestock and family composition. We found that most of the transmission is associated with winds between 6pm and 10pm although later hours are also important. Mixed mosquito flight (downwind and upwind both with random components) were the most likely movements causing the spread of malaria in two out of the three study areas. Finally, our algorithm (named MALSWOTS) provided an estimate of the speed of malaria infection progression from house to house which was around 200–400 meters per day, a figure coherent with mark-release-recapture studies of Anopheles dispersion. Cross validation using an out-of-sample procedure showed accurate identification of hotspots. Our findings provide a significant contribution towards the identification and development of optimal tools for efficient and effective spatio-temporal targeted malaria interventions over potential hotspot areas.

Number of days of flight (DoF) was set between 1 to 18 days, ending from the number of days prior to the infection date (e.g. if infection occurred on the 29 th of July and the number of days prior to infection was 10, then the mosquito arrived in the house on the 19 th of July with flight starting from the departing house any day from the 2 nd of July; which means that between the 19 th of July and the 29 th of July the infectious bite and incubation period were completed). The sum of maximum number of days previous to infection and maximum days of flight (DPI+DoF = 39 days) is within the lifespan of Anopheles mosquitoes 6 . MALSWOTS does not model the lifespan of the mosquito and assumes homogenous flight capabilities within the DoF + DPI period. S1 Fig shows how the parameters DPI and DoF operate within MALSWOTS.
Numbers of hours of continuous flight is set at 1h per day (or time of the day period). Tethered flight experiments have shown that An. gambiae and An. atroparvus can continuously fly for 1 to 4 hours, although most flights lasted less than 1 hour 7 .
The maximum distance of flight in one day is 1000 m (therefore any distance between 0 and 1000 m is accepted as day flight). This value reflects the dispersal capacity of An. gambiae, An. arabiensis and An. funestus. Mark-release-recapture studies estimate an average distance between 150m and 500m away from the release location, although many mosquitoes are still found well over 1000m one day post-release [8][9][10][11][12][13] . Here, we do not consider the long-range migratory behaviour identified in Sahelian mosquitoes 14 as we are considering climate and environment at near soil conditions and our primary interest is movement between houses. Long-range dispersion above 1 km in our setting is thus more likely to happen in steps over multiple days instead of single flights.
A large body of literature is available on mosquito's attractants (primarily carbon dioxide) and their capacity to stimulate upwind mosquito movement [15][16][17][18][19][20][21] . However, limited information is available for the distance at which the mosquito can detect the host. Here, we set this distance at 50m according to previous work 22,23 .
Distance around each house (essentially a buffer zone around the house that determines a successful connection) permits outdoor malaria transmission 24 and flexibility in the model. This parameter was fixed at 15 m, lower than those proposed elsewhere 25 to allow for houses separated by as little as 30 m to have a different infection status.
Anopheles flight speed is heterogeneously reported and is generally around 0.3 m/s but can be as high as 1m/s for short periods [26][27][28] . However, these values do not discriminate between mosquito flight and wind propulsion. Therefore, we have set this parameter to the average value of 0.1 m/s. In upwind movement this value of 0.1 m/s is reduced according to the wind speed (e.g. if the wind speed is 0.05 m/s, the upwind flight speed is 0.05 m/s) while in downwind movement the mosquito flight speed adds to the wind speed (e.g. if the wind speed is 0.1 m/s, the downwind flight speed is 0.2 m/s).
High winds can inhibit flight and low numbers of mosquitoes are caught at winds above 40 km per hour 3 . We therefore set the wind limits for all flight at 11 m/s based on limits for upwind flight 2 , although the average hourly wind speed (and gust speed) rarely reached this value within the MMP area.
Finally, wind direction is highly variable 29 and a tolerance around the average direction is often applied in mosquito flight modelling 30 . For the present analysis we set this tolerance to 45 degrees around the hourly average. This value is consistent with the capacity of the mosquitoes to adjust their flight if odour plumes come from a different direction 30 .

MALSWOTS algorithm
As in SWOTS (see 31 for a detailed description of the calculations), MALSWOTS data preparation starts by transforming the wind plumes into mosquito trajectories for the different mosquitoes' movements (by using the conditions stated in mosquito flight parameters section): downwind, upwind, random, downwind and random, and upwind and random; while the potential infections of houses are transformed into infection connections (see MALSWOTS algorithm flow below). Various controls are performed at the beginning and if necessary infected locations that are too close to uninfected locations are removed (i.e. if the infected and uninfected house are within 65 m -the sum of the distance at which the mosquito can detect the host and the distance around each house).
Taking each infected house in turn, from the date of its infection, the connections were created for every other house infected at some future date (up to a maximum of DPI+DoF days). A maximum distance limitation was also applied (see mosquitoes flight parameters). The infection connections form a matrix consisting of rows representing houses and the five following columns: average number of days to potentially infect other houses; mean distance of connections along the longitude; mean distance of connections along the latitude; average connection speed in meters per days; and frequency of connections along the average connection directions. These five columns were converted from a five dimension Euclidian space to a spherical system (which allow the standardisation of the dimensions) 32 . The same matrix is created for each mosquito trajectory. Mosquito trajectories are the sum of geometric vectors, starting from the infected house in the first day and from the endpoint of the geometric vector the subsequent day (the head-to-tail sum rule). Therefore, mosquito trajectories do not necessarily end in an infected or uninfected house; they can end in any point of the geographic area. Their length and direction depend on wind speed and direction as described above. Mosquitoes are assumed to use a variety of movement strategies, up-and down-wind, random and mixed movements, the latter allowing for a random component in their daily flight. Assuming each of these strategies in turn, and a mixture of these strategies, five resulting mosquito trajectories matrices were produced: downwind, upwind, random, mixed downwind and random, and mixed upwind and random. Between each mosquito movement matrix and infection connections matrix, both in spherical space, the vectorial correlation is calculated following the approach proposed by 33 and fully described in 31 .
The process described above was repeated 450 times, each time with a different combination of DoF (up to 18 days) and DPI (up to 21 days). In addition, it was repeated for different time periods: 6.00pm-10.59pm, 11.00pm-3.59am, 4.00am-8.59am, and finally for the entire period studied 6.00pm-8.59am where mosquitoes are assumed flying for the full length of the time period; therefore providing 9000 correlations between infection connections and mosquitoes movements (450 combinations x 4 day periods x 5 mosquitoes movements). The day periods were chosen as representative of the Malawian Anopheles biting pattern 34 . Each calculated correlation was tested via spatial bootstrapping to ensure sensitivity and to reduce error 35 . The spatial bootstrapping allows for the definition of the level of significance, here taken to be 99%.
The (significant) correlations were weighted 36 by the number of surviving mosquitoes (see below for the details of the survival model). The subsequent analysis is applied with the parameters associated to the top correlations (or the theoretical best if preferred). The analysis is re-run to describe mosquito trajectories from house to house and included uncertainty in wind speeds and directions. For the latter, a stochastic component for both wind speed and direction were added. In each simulation, a new value of wind speed and direction for each grid node is sampled. The wind speeds are assumed to be multivariate log-normal distributed, where the mean is a vector containing each grid node's average wind speed for the considered time period (day periods defined above) while the variance is fixed to 1. Assuming a multivariate distribution allows the inclusion of correlations between wind speeds at different grid nodes 37 . The wind directions are assumed to be normally distributed, with the mean at each grid nodes equal to the average wind direction for the considered time period and the variance equal to 1. This normal distribution is wrapped in order to have values between -π and π 38 . It is important to note that each focal area has been converted into grids with each cell containing the values of wind and other environmental variables for that day or time of the day (see below).
At each simulation, from a new (sampled) set of wind speeds and directions, the five types of mosquito trajectories were re-calculated. The total number of simulations was 1,000. For each newly infected house we therefore had a set of connections from different, previously infected houses. The MALSWOTS algorithm flow (modified from 31 ) is provided below.  The probability of infection by the different types of mosquito movement at a house and the cumulative probability of a house escaping infection (a cross-product of the probability of each movement infecting the house) are obtained from the survival of the mosquito at certain environmental conditions (i.e. temperature and rainfall) and time of flight. Mosquito survival is calculated by employing an exponential model with the number of mosquitoes caught in each trap Poisson distributed with the mean proportional to time and rate linearly dependent on temperature and precipitation: where X is the number of mosquitoes caught at time t; time, t, is considered as offset; 0 is the intercept; 1 is the coefficient for precipitation; and 2 is the coefficient for temperature. The survival model was restricted to precipitation and temperature only since relative humidity and soil water content were not found statistically significant predictors when added to the model or in alternative to rainfall and temperature.
These probabilities of survival (and therefore infection) are multiplied by the probability of finding infected houses in the area (grid cell) and by the respective frequency of movements. The latter product is then adjusted by the total number of potential connections (assuming one day is equal to one connection per mosquito movement) and total number of simulations to obtain the final probabilities of (i) a house being infected from all surrounding infectious houses (since one house is likely to be infected by mosquitoes arriving from multiple houses with different movement types), (ii) a house being infected by a specific house, and (iii) a house being infected by a type of mosquito movement (see S2 Fig).
At present, the probability of a house being infected does not account for housing characteristics (number of individuals in the house, house type etc), other than the two climate variables used to model mosquito survival. However, the current framework does allow inclusion of house-dependent factors or additional environmental (including climate) variables if the data is available.